Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 443 | 437 |
| Missing cells (%) | 8.3% | 8.2% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
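The percentage and per-record figures in this table can be cross-checked from the raw counts. The sketch below assumes that "Missing cells (%)" is taken over all rows × columns and that 1 KiB = 1024 B; the function name is illustrative and not part of ydata-profiling.

```python
def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of missing cells over the full rows x columns grid, in percent."""
    return 100.0 * missing_cells / (n_rows * n_cols)

# Values copied from the table above.
pct_a = missing_cell_pct(443, n_rows=446, n_cols=12)
pct_b = missing_cell_pct(437, n_rows=446, n_cols=12)

# Average record size: total memory (KiB converted to bytes) per observation.
avg_record_bytes = 45.3 * 1024 / 446

print(f"{pct_a:.1f}% {pct_b:.1f}% {avg_record_bytes:.1f} B")
```

Under these assumptions the results round to the reported 8.3%, 8.2% and 104.0 B.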
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Age has 91 (20.4%) missing values | Age has 91 (20.4%) missing values | Missing |
| Cabin has 351 (78.7%) missing values | Cabin has 345 (77.4%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 297 (66.6%) zeros | SibSp has 302 (67.7%) zeros | Zeros |
| Parch has 345 (77.4%) zeros | Parch has 338 (75.8%) zeros | Zeros |
| Fare has 12 (2.7%) zeros | Alert not present in this dataset | Zeros |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2023-12-05 15:50:46.515326 | 2023-12-05 15:50:50.896896 |
| Analysis finished | 2023-12-05 15:50:50.895633 | 2023-12-05 15:50:54.741189 |
| Duration | 4.38 seconds | 3.84 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
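The "Duration" row is the difference between the two timestamps above it. A minimal sketch of that check, using only the standard library (the helper name and format string are illustrative assumptions):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # timestamp layout used in the table above

def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two report timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()

dur_a = duration_seconds("2023-12-05 15:50:46.515326", "2023-12-05 15:50:50.895633")
dur_b = duration_seconds("2023-12-05 15:50:50.896896", "2023-12-05 15:50:54.741189")

print(f"{dur_a:.2f} s, {dur_b:.2f} s")
```

Both values agree with the reported 4.38 and 3.84 seconds after rounding to two decimals.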
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 432.40135 | 448.72422 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 6 |
| Maximum | 889 | 891 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 6 |
| 5-th percentile | 37.25 | 52.25 |
| Q1 | 203.75 | 229.5 |
| median | 432.5 | 441.5 |
| Q3 | 666.75 | 673.5 |
| 95-th percentile | 832.5 | 849.75 |
| Maximum | 889 | 891 |
| Range | 888 | 885 |
| Interquartile range (IQR) | 463 | 444 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 259.54475 | 257.57268 |
| Coefficient of variation (CV) | 0.60024038 | 0.57401111 |
| Kurtosis | -1.243708 | -1.2069199 |
| Mean | 432.40135 | 448.72422 |
| Median Absolute Deviation (MAD) | 232.5 | 224.5 |
| Skewness | 0.010447404 | 0.024528039 |
| Sum | 192851 | 200131 |
| Variance | 67363.477 | 66343.688 |
| Monotonicity | Not monotonic | Not monotonic |
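Several of the PassengerId rows are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the variance is the squared standard deviation, and the CV is the standard deviation divided by the mean. The sketch below recomputes them from the reported values; the dictionary keys are illustrative names, not ydata-profiling identifiers.

```python
# Reported PassengerId statistics, copied from the tables above.
stats = {
    "A": {"min": 1, "max": 889, "q1": 203.75, "q3": 666.75,
          "mean": 432.40135, "std": 259.54475},
    "B": {"min": 6, "max": 891, "q1": 229.5, "q3": 673.5,
          "mean": 448.72422, "std": 257.57268},
}

derived = {}
for name, s in stats.items():
    derived[name] = {
        "range": s["max"] - s["min"],   # maximum minus minimum
        "iqr": s["q3"] - s["q1"],       # interquartile range
        "variance": s["std"] ** 2,      # squared standard deviation
        "cv": s["std"] / s["mean"],     # coefficient of variation
    }

for name, d in derived.items():
    print(name, d)
```

Because the inputs are already rounded, the recomputed variance and CV can differ from the report in the last printed digits.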
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 512 | 1 | 0.2% |
| 245 | 1 | 0.2% |
| 399 | 1 | 0.2% |
| 753 | 1 | 0.2% |
| 150 | 1 | 0.2% |
| 547 | 1 | 0.2% |
| 503 | 1 | 0.2% |
| 475 | 1 | 0.2% |
| 109 | 1 | 0.2% |
| 517 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 499 | 1 | 0.2% |
| 693 | 1 | 0.2% |
| 758 | 1 | 0.2% |
| 195 | 1 | 0.2% |
| 72 | 1 | 0.2% |
| 542 | 1 | 0.2% |
| 306 | 1 | 0.2% |
| 132 | 1 | 0.2% |
| 715 | 1 | 0.2% |
| 593 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
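Extreme values (minimum 10 values)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.2% |
| 2 | 1 | 0.2% |
| 3 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 11 | 1 | 0.2% |
| 15 | 1 | 0.2% |
| 17 | 1 | 0.2% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 13 | 1 | 0.2% |
| 14 | 1 | 0.2% |
| 16 | 1 | 0.2% |
| 17 | 1 | 0.2% |
| 20 | 1 | 0.2% |
| 22 | 1 | 0.2% |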
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
[Plot: frequencies of categories 0 and 1, Dataset A and Dataset B]
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 2 | 2 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 0 |
| 2nd row | 0 | 0 |
| 3rd row | 0 | 0 |
| 4th row | 1 | 0 |
| 5th row | 1 | 0 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 272 | 61.0% |
| 1 | 174 | 39.0% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 278 | 62.3% |
| 1 | 168 | 37.7% |
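The "Frequency (%)" column in these tables is each count divided by the number of observations (446 in both datasets). A minimal sketch, assuming one-decimal rounding as displayed elsewhere in the report (the function name is illustrative):

```python
from typing import Dict

def frequency_pct(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert value counts into percentages of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {value: round(100.0 * n / total, 1) for value, n in counts.items()}

# Survived counts taken from the Common Values tables above.
freq_a = frequency_pct({"0": 272, "1": 174})
freq_b = frequency_pct({"0": 278, "1": 168})

print(freq_a, freq_b)
```

The same calculation applies to the other categorical variables in this report, as long as the counts cover all observations.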
Length: [plot not captured]
Common Values (Plot): [plots for Dataset A and Dataset B not captured; the accompanying tables repeated the Common Values counts above]
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 278 | |
| 1 | 168 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 278 | |
| 1 | 168 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 278 | |
| 1 | 168 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 278 | |
| 1 | 168 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
[Plot: frequencies of categories 3, 1 and 2, Dataset A and Dataset B]
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 1 |
| 2nd row | 3 | 3 |
| 3rd row | 3 | 3 |
| 4th row | 3 | 3 |
| 5th row | 3 | 3 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 263 | 59.0% |
| 1 | 100 | 22.4% |
| 2 | 83 | 18.6% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 232 | 52.0% |
| 1 | 112 | 25.1% |
| 2 | 102 | 22.9% |
Length: [plot not captured]
Common Values (Plot): [plots for Dataset A and Dataset B not captured; the accompanying tables repeated the Common Values counts above]
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 263 | |
| 1 | 100 | 22.4% |
| 2 | 83 | 18.6% |
| Value | Count | Frequency (%) |
| 3 | 232 | |
| 1 | 112 | |
| 2 | 102 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 263 | |
| 1 | 100 | 22.4% |
| 2 | 83 | 18.6% |
| Value | Count | Frequency (%) |
| 3 | 232 | |
| 1 | 112 | |
| 2 | 102 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 263 | |
| 1 | 100 | 22.4% |
| 2 | 83 | 18.6% |
| Value | Count | Frequency (%) |
| 3 | 232 | |
| 1 | 112 | |
| 2 | 102 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 263 | |
| 1 | 100 | 22.4% |
| 2 | 83 | 18.6% |
| Value | Count | Frequency (%) |
| 3 | 232 | |
| 1 | 112 | |
| 2 | 102 |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 82 |
| Median length | 52 | 49 |
| Mean length | 27.426009 | 26.746637 |
| Min length | 12 | 12 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 12232 | 11929 |
| Distinct characters | 60 | 59 |
| Distinct categories | 7 | 7 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Webber, Mr. James | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) |
| 2nd row | Harknett, Miss. Alice Phoebe | Balkic, Mr. Cerin |
| 3rd row | Andersson, Miss. Sigrid Elisabeth | Boulos, Miss. Nourelain |
| 4th row | Dean, Master. Bertram Vere | Salonen, Mr. Johan Werner |
| 5th row | Dahl, Mr. Karl Edwart | O'Connell, Mr. Patrick D |
| Value | Count | Frequency (%) |
| mr | 259 | 14.0% |
| miss | 94 | 5.1% |
| mrs | 69 | 3.7% |
| william | 32 | 1.7% |
| john | 25 | 1.4% |
| henry | 18 | 1.0% |
| thomas | 16 | 0.9% |
| master | 15 | 0.8% |
| johan | 10 | 0.5% |
| frederick | 10 | 0.5% |
| Other values (892) | 1296 |
| Value | Count | Frequency (%) |
| mr | 266 | 14.8% |
| miss | 92 | 5.1% |
| mrs | 60 | 3.3% |
| william | 31 | 1.7% |
| john | 22 | 1.2% |
| master | 18 | 1.0% |
| henry | 18 | 1.0% |
| charles | 17 | 0.9% |
| james | 15 | 0.8% |
| george | 13 | 0.7% |
| Other values (870) | 1246 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1399 | 11.4% |
| r | 986 | 8.1% |
| e | 878 | 7.2% |
| a | 853 | 7.0% |
| s | 670 | 5.5% |
| n | 652 | 5.3% |
| i | 643 | 5.3% |
| M | 570 | 4.7% |
| l | 541 | 4.4% |
| o | 508 | 4.2% |
| Other values (50) | 4532 |
| Value | Count | Frequency (%) |
| (space) | 1353 | 11.3% |
| r | 995 | 8.3% |
| e | 885 | 7.4% |
| a | 807 | 6.8% |
| n | 663 | 5.6% |
| s | 645 | 5.4% |
| i | 641 | 5.4% |
| M | 551 | 4.6% |
| l | 517 | 4.3% |
| o | 496 | 4.2% |
| Other values (49) | 4376 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7854 | |
| Uppercase Letter | 1858 | 15.2% |
| Space Separator | 1399 | 11.4% |
| Other Punctuation | 966 | 7.9% |
| Open Punctuation | 73 | 0.6% |
| Close Punctuation | 73 | 0.6% |
| Dash Punctuation | 9 | 0.1% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 7689 | |
| Uppercase Letter | 1811 | 15.2% |
| Space Separator | 1353 | 11.3% |
| Other Punctuation | 944 | 7.9% |
| Close Punctuation | 63 | 0.5% |
| Open Punctuation | 63 | 0.5% |
| Dash Punctuation | 6 | 0.1% |
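The category names in these tables match Unicode general categories. As a rough illustration, the sketch below classifies the characters of one sample name from this report with Python's standard `unicodedata` module. This is an assumption about how such counts can be reproduced, not a statement of how ydata-profiling computes them internally; the mapping table and function name are illustrative.

```python
import unicodedata
from collections import Counter

# Two-letter Unicode general category codes mapped to the labels used in the report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

def category_counts(text: str) -> Counter:
    """Count characters per Unicode general category, using readable labels."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# One of the sample rows shown for Dataset B.
counts = category_counts("O'Connell, Mr. Patrick D")
print(counts)
```

Summing such per-value counters over a whole column would give totals comparable to the tables above.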
Most frequent character per category
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1399 | |
| Value | Count | Frequency (%) |
| (space) | 1353 | |
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 986 | |
| e | 878 | |
| a | 853 | |
| s | 670 | |
| n | 652 | |
| i | 643 | |
| l | 541 | 6.9% |
| o | 508 | 6.5% |
| t | 352 | 4.5% |
| h | 273 | 3.5% |
| Other values (16) | 1498 |
| Value | Count | Frequency (%) |
| r | 995 | |
| e | 885 | |
| a | 807 | |
| n | 663 | |
| s | 645 | |
| i | 641 | |
| l | 517 | 6.7% |
| o | 496 | 6.5% |
| t | 343 | 4.5% |
| h | 274 | 3.6% |
| Other values (16) | 1423 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 570 | |
| A | 137 | 7.4% |
| J | 117 | 6.3% |
| H | 95 | 5.1% |
| S | 91 | 4.9% |
| E | 86 | 4.6% |
| B | 81 | 4.4% |
| C | 80 | 4.3% |
| W | 67 | 3.6% |
| P | 61 | 3.3% |
| Other values (15) | 473 |
| Value | Count | Frequency (%) |
| M | 551 | |
| A | 116 | 6.4% |
| H | 110 | 6.1% |
| J | 97 | 5.4% |
| C | 92 | 5.1% |
| E | 90 | 5.0% |
| S | 83 | 4.6% |
| R | 71 | 3.9% |
| B | 69 | 3.8% |
| L | 68 | 3.8% |
| Other values (14) | 464 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 447 | |
| , | 446 | |
| " | 64 | 6.6% |
| ' | 8 | 0.8% |
| / | 1 | 0.1% |
| Value | Count | Frequency (%) |
| . | 446 | |
| , | 446 | |
| " | 46 | 4.9% |
| ' | 5 | 0.5% |
| / | 1 | 0.1% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 73 |
| Value | Count | Frequency (%) |
| ( | 63 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 73 |
| Value | Count | Frequency (%) |
| ) | 63 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 9 |
| Value | Count | Frequency (%) |
| - | 6 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9712 | |
| Common | 2520 | 20.6% |
| Value | Count | Frequency (%) |
| Latin | 9500 | |
| Common | 2429 | 20.4% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 1399 | |
| . | 447 | 17.7% |
| , | 446 | 17.7% |
| ( | 73 | 2.9% |
| ) | 73 | 2.9% |
| " | 64 | 2.5% |
| - | 9 | 0.4% |
| ' | 8 | 0.3% |
| / | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| (space) | 1353 | |
| . | 446 | 18.4% |
| , | 446 | 18.4% |
| ) | 63 | 2.6% |
| ( | 63 | 2.6% |
| " | 46 | 1.9% |
| - | 6 | 0.2% |
| ' | 5 | 0.2% |
| / | 1 | < 0.1% |
Latin
| Value | Count | Frequency (%) |
| r | 986 | 10.2% |
| e | 878 | 9.0% |
| a | 853 | 8.8% |
| s | 670 | 6.9% |
| n | 652 | 6.7% |
| i | 643 | 6.6% |
| M | 570 | 5.9% |
| l | 541 | 5.6% |
| o | 508 | 5.2% |
| t | 352 | 3.6% |
| Other values (41) | 3059 |
| Value | Count | Frequency (%) |
| r | 995 | 10.5% |
| e | 885 | 9.3% |
| a | 807 | 8.5% |
| n | 663 | 7.0% |
| s | 645 | 6.8% |
| i | 641 | 6.7% |
| M | 551 | 5.8% |
| l | 517 | 5.4% |
| o | 496 | 5.2% |
| t | 343 | 3.6% |
| Other values (40) | 2957 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 12232 |
| Value | Count | Frequency (%) |
| ASCII | 11929 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 1399 | 11.4% |
| r | 986 | 8.1% |
| e | 878 | 7.2% |
| a | 853 | 7.0% |
| s | 670 | 5.5% |
| n | 652 | 5.3% |
| i | 643 | 5.3% |
| M | 570 | 4.7% |
| l | 541 | 4.4% |
| o | 508 | 4.2% |
| Other values (50) | 4532 |
| Value | Count | Frequency (%) |
| (space) | 1353 | 11.3% |
| r | 995 | 8.3% |
| e | 885 | 7.4% |
| a | 807 | 6.8% |
| n | 663 | 5.6% |
| s | 645 | 5.4% |
| i | 641 | 5.4% |
| M | 551 | 4.6% |
| l | 517 | 4.3% |
| o | 496 | 4.2% |
| Other values (49) | 4376 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
[Plot: frequencies of categories male and female, Dataset A and Dataset B]
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7264574 | 4.6860987 |
| Min length | 4 | 4 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2108 | 2090 |
| Distinct characters | 5 | 5 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | female |
| 2nd row | female | male |
| 3rd row | female | female |
| 4th row | male | male |
| 5th row | male | male |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| male | 284 | 63.7% |
| female | 162 | 36.3% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| male | 293 | 65.7% |
| female | 153 | 34.3% |
Length: [plot not captured]
Common Values (Plot): [plots for Dataset A and Dataset B not captured; the accompanying tables repeated the Common Values counts above]
Most occurring characters
| Value | Count | Frequency (%) |
| e | 608 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 162 | 7.7% |
| Value | Count | Frequency (%) |
| e | 599 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 153 | 7.3% |
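The character-level figures for Sex follow directly from the value counts: each occurrence of "male" or "female" contributes its letters once. The sketch below rebuilds the total characters, mean length and per-character counts from the Common Values counts; the function name is an illustrative assumption.

```python
from collections import Counter
from typing import Dict, Tuple

def character_stats(value_counts: Dict[str, int]) -> Tuple[int, float, Counter]:
    """Return (total characters, mean value length, per-character counts)."""
    n_values = sum(value_counts.values())
    char_counts = Counter()
    for value, n in value_counts.items():
        for ch in value:
            char_counts[ch] += n  # every occurrence of the value repeats its letters
    total_chars = sum(char_counts.values())
    return total_chars, total_chars / n_values, char_counts

total_a, mean_a, chars_a = character_stats({"male": 284, "female": 162})
total_b, mean_b, chars_b = character_stats({"male": 293, "female": 153})

print(total_a, round(mean_a, 7), dict(chars_a))
print(total_b, round(mean_b, 7), dict(chars_b))
```

For example, "e" appears once in "male" and twice in "female", giving 284 + 2 × 162 = 608 in Dataset A, as in the table above.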
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 2108 |
| Value | Count | Frequency (%) |
| Lowercase Letter | 2090 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 608 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 162 | 7.7% |
| Value | Count | Frequency (%) |
| e | 599 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 153 | 7.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2108 |
| Value | Count | Frequency (%) |
| Latin | 2090 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 608 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 162 | 7.7% |
| Value | Count | Frequency (%) |
| e | 599 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 153 | 7.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2108 |
| Value | Count | Frequency (%) |
| ASCII | 2090 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 608 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 162 | 7.7% |
| Value | Count | Frequency (%) |
| e | 599 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 153 | 7.3% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 72 | 76 |
| Distinct (%) | 20.3% | 21.4% |
| Missing | 91 | 91 |
| Missing (%) | 20.4% | 20.4% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.284986 | 29.341324 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| Maximum | 74 | 71 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| 5-th percentile | 4.7 | 3.7 |
| Q1 | 19.5 | 20 |
| median | 28 | 28 |
| Q3 | 37.5 | 39 |
| 95-th percentile | 54 | 54 |
| Maximum | 74 | 71 |
| Range | 73.58 | 70.58 |
| Interquartile range (IQR) | 18 | 19 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.16599 | 14.453535 |
| Coefficient of variation (CV) | 0.48372876 | 0.49259997 |
| Kurtosis | 0.15303096 | -0.24400782 |
| Mean | 29.284986 | 29.341324 |
| Median Absolute Deviation (MAD) | 9 | 9 |
| Skewness | 0.40395706 | 0.21125129 |
| Sum | 10396.17 | 10416.17 |
| Variance | 200.67527 | 208.90468 |
| Monotonicity | Not monotonic | Not monotonic |
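Age has 91 missing values in each dataset, so its mean is consistent with the sum divided by the 355 non-missing observations, while "Missing (%)" is relative to all 446 rows. A small sketch of both checks, with an illustrative helper name:

```python
def non_missing_mean(total: float, n_rows: int, n_missing: int) -> float:
    """Mean over the observations that actually have a value."""
    return total / (n_rows - n_missing)

# Sum, row count and missing count copied from the Age tables above.
mean_age_a = non_missing_mean(10396.17, n_rows=446, n_missing=91)
mean_age_b = non_missing_mean(10416.17, n_rows=446, n_missing=91)

# Missing percentage is taken over all rows, including the missing ones.
missing_pct = 100.0 * 91 / 446

print(round(mean_age_a, 4), round(mean_age_b, 4), round(missing_pct, 1))
```

The recomputed means agree with the reported 29.284986 and 29.341324 to within the rounding of the printed sum.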
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 19 | 16 | 3.6% |
| 24 | 15 | 3.4% |
| 36 | 14 | 3.1% |
| 28 | 14 | 3.1% |
| 16 | 12 | 2.7% |
| 22 | 12 | 2.7% |
| 18 | 12 | 2.7% |
| 25 | 12 | 2.7% |
| 26 | 12 | 2.7% |
| 32 | 11 | 2.5% |
| Other values (62) | 225 | 50.4% |
| (Missing) | 91 | 20.4% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 25 | 13 | 2.9% |
| 21 | 13 | 2.9% |
| 24 | 13 | 2.9% |
| 18 | 12 | 2.7% |
| 30 | 12 | 2.7% |
| 22 | 12 | 2.7% |
| 19 | 12 | 2.7% |
| 20 | 12 | 2.7% |
| 16 | 11 | 2.5% |
| 31 | 10 | 2.2% |
| Other values (66) | 235 | 52.7% |
| (Missing) | 91 | 20.4% |
Extreme values (minimum 10 values)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0.42 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 1 | 3 | 0.7% |
| 2 | 6 | 1.3% |
| 3 | 4 | 0.9% |
| 4 | 3 | 0.7% |
| 5 | 3 | 0.7% |
| 6 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| 8 | 3 | 0.7% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0.42 | 1 | 0.2% |
| 0.67 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 0.83 | 2 | 0.4% |
| 0.92 | 1 | 0.2% |
| 1 | 2 | 0.4% |
| 2 | 6 | 1.3% |
| 3 | 3 | 0.7% |
| 4 | 3 | 0.7% |
| 5 | 3 | 0.7% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.5426009 | 0.56278027 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 297 | 302 |
| Zeros (%) | 66.6% | 67.7% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2 | 2.75 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.1224878 | 1.2102021 |
| Coefficient of variation (CV) | 2.0687172 | 2.1503989 |
| Kurtosis | 18.535871 | 16.771978 |
| Mean | 0.5426009 | 0.56278027 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.7667746 | 3.6922183 |
| Sum | 242 | 251 |
| Variance | 1.2599788 | 1.4645891 |
| Monotonicity | Not monotonic | Not monotonic |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 297 | 66.6% |
| 1 | 111 | 24.9% |
| 2 | 17 | 3.8% |
| 4 | 8 | 1.8% |
| 3 | 6 | 1.3% |
| 8 | 4 | 0.9% |
| 5 | 3 | 0.7% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 302 | 67.7% |
| 1 | 103 | 23.1% |
| 2 | 18 | 4.0% |
| 4 | 8 | 1.8% |
| 3 | 5 | 1.1% |
| 5 | 5 | 1.1% |
| 8 | 5 | 1.1% |
Extreme values (minimum 10 values)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 297 | 66.6% |
| 1 | 111 | 24.9% |
| 2 | 17 | 3.8% |
| 3 | 6 | 1.3% |
| 4 | 8 | 1.8% |
| 5 | 3 | 0.7% |
| 8 | 4 | 0.9% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 302 | 67.7% |
| 1 | 103 | 23.1% |
| 2 | 18 | 4.0% |
| 3 | 5 | 1.1% |
| 4 | 8 | 1.8% |
| 5 | 5 | 1.1% |
| 8 | 5 | 1.1% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.367713 | 0.39013453 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 6 | 6 |
| Zeros | 345 | 338 |
| Zeros (%) | 77.4% | 75.8% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 6 | 6 |
| Range | 6 | 6 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.82611187 | 0.82939885 |
| Coefficient of variation (CV) | 2.2466213 | 2.1259304 |
| Kurtosis | 12.319631 | 11.125881 |
| Mean | 0.367713 | 0.39013453 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.0941863 | 2.8930729 |
| Sum | 164 | 174 |
| Variance | 0.68246083 | 0.68790245 |
| Monotonicity | Not monotonic | Not monotonic |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 345 | 77.4% |
| 1 | 58 | 13.0% |
| 2 | 34 | 7.6% |
| 5 | 3 | 0.7% |
| 3 | 3 | 0.7% |
| 4 | 2 | 0.4% |
| 6 | 1 | 0.2% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 338 | 75.8% |
| 1 | 60 | 13.5% |
| 2 | 40 | 9.0% |
| 3 | 3 | 0.7% |
| 5 | 3 | 0.7% |
| 6 | 1 | 0.2% |
| 4 | 1 | 0.2% |
Extreme values (minimum 10 values)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 345 | 77.4% |
| 1 | 58 | 13.0% |
| 2 | 34 | 7.6% |
| 3 | 3 | 0.7% |
| 4 | 2 | 0.4% |
| 5 | 3 | 0.7% |
| 6 | 1 | 0.2% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 338 | 75.8% |
| 1 | 60 | 13.5% |
| 2 | 40 | 9.0% |
| 3 | 3 | 0.7% |
| 4 | 1 | 0.2% |
| 5 | 3 | 0.7% |
| 6 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 378 | 380 |
| Distinct (%) | 84.8% | 85.2% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.5650224 | 6.6547085 |
| Min length | 3 | 3 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2928 | 2968 |
| Distinct characters | 34 | 32 |
| Distinct categories | 5 | 5 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 326 | 339 |
| Unique (%) | 73.1% | 76.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | SOTON/OQ 3101316 | 113781 |
| 2nd row | W./C. 6609 | 349248 |
| 3rd row | 347082 | 2678 |
| 4th row | C.A. 2315 | 3101296 |
| 5th row | 7598 | 334912 |
| Value | Count | Frequency (%) |
| pc | 24 | 4.3% |
| c.a | 15 | 2.7% |
| ca | 9 | 1.6% |
| ston/o | 7 | 1.3% |
| 2 | 7 | 1.3% |
| a/5 | 7 | 1.3% |
| 2144 | 4 | 0.7% |
| a/4 | 4 | 0.7% |
| w./c | 4 | 0.7% |
| line | 4 | 0.7% |
| Other values (395) | 468 |
| Value | Count | Frequency (%) |
| pc | 32 | 5.7% |
| a/5 | 12 | 2.1% |
| ca | 11 | 1.9% |
| c.a | 9 | 1.6% |
| w./c | 7 | 1.2% |
| 2144 | 6 | 1.1% |
| 382652 | 5 | 0.9% |
| sc/paris | 5 | 0.9% |
| 2343 | 5 | 0.9% |
| 347082 | 5 | 0.9% |
| Other values (401) | 468 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 381 | |
| 1 | 337 | |
| 2 | 297 | |
| 4 | 248 | |
| 7 | 239 | |
| 6 | 206 | 7.0% |
| 0 | 201 | 6.9% |
| 5 | 191 | 6.5% |
| 9 | 165 | 5.6% |
| 8 | 138 | 4.7% |
| Other values (24) | 525 |
| Value | Count | Frequency (%) |
| 3 | 355 | |
| 1 | 344 | |
| 2 | 304 | |
| 4 | 251 | |
| 7 | 249 | |
| 6 | 210 | 7.1% |
| 0 | 191 | 6.4% |
| 5 | 184 | 6.2% |
| 9 | 161 | 5.4% |
| 8 | 145 | 4.9% |
| Other values (22) | 574 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2403 | |
| Uppercase Letter | 278 | 9.5% |
| Other Punctuation | 128 | 4.4% |
| Space Separator | 107 | 3.7% |
| Lowercase Letter | 12 | 0.4% |
| Value | Count | Frequency (%) |
| Decimal Number | 2394 | |
| Uppercase Letter | 300 | 10.1% |
| Other Punctuation | 146 | 4.9% |
| Space Separator | 119 | 4.0% |
| Lowercase Letter | 9 | 0.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 381 | |
| 1 | 337 | |
| 2 | 297 | |
| 4 | 248 | |
| 7 | 239 | |
| 6 | 206 | |
| 0 | 201 | |
| 5 | 191 | |
| 9 | 165 | |
| 8 | 138 | 5.7% |
| Value | Count | Frequency (%) |
| 3 | 355 | |
| 1 | 344 | |
| 2 | 304 | |
| 4 | 251 | |
| 7 | 249 | |
| 6 | 210 | |
| 0 | 191 | |
| 5 | 184 | |
| 9 | 161 | |
| 8 | 145 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 107 | |
| Value | Count | Frequency (%) |
| (space) | 119 | |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 87 | |
| / | 41 |
| Value | Count | Frequency (%) |
| . | 98 | |
| / | 48 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 64 | |
| O | 43 | |
| A | 40 | |
| P | 40 | |
| S | 28 | |
| N | 20 | 7.2% |
| T | 16 | 5.8% |
| W | 6 | 2.2% |
| Q | 5 | 1.8% |
| E | 4 | 1.4% |
| Other values (5) | 12 | 4.3% |
| Value | Count | Frequency (%) |
| C | 84 | |
| P | 46 | |
| A | 44 | |
| O | 36 | |
| S | 32 | 10.7% |
| N | 12 | 4.0% |
| T | 11 | 3.7% |
| W | 11 | 3.7% |
| F | 6 | 2.0% |
| Q | 5 | 1.7% |
| Other values (5) | 13 | 4.3% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 3 | |
| s | 3 | |
| r | 2 | |
| i | 2 | |
| l | 1 | 8.3% |
| e | 1 | 8.3% |
| Value | Count | Frequency (%) |
| a | 3 | |
| i | 2 | |
| s | 2 | |
| r | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 2638 | |
| Latin | 290 | 9.9% |
| Value | Count | Frequency (%) |
| Common | 2659 | |
| Latin | 309 | 10.4% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 381 | |
| 1 | 337 | |
| 2 | 297 | |
| 4 | 248 | |
| 7 | 239 | |
| 6 | 206 | |
| 0 | 201 | |
| 5 | 191 | |
| 9 | 165 | |
| 8 | 138 | 5.2% |
| Other values (3) | 235 |
| Value | Count | Frequency (%) |
| 3 | 355 | |
| 1 | 344 | |
| 2 | 304 | |
| 4 | 251 | |
| 7 | 249 | |
| 6 | 210 | |
| 0 | 191 | |
| 5 | 184 | |
| 9 | 161 | |
| 8 | 145 | |
| Other values (3) | 265 |
Latin
| Value | Count | Frequency (%) |
| C | 64 | |
| O | 43 | |
| A | 40 | |
| P | 40 | |
| S | 28 | |
| N | 20 | 6.9% |
| T | 16 | 5.5% |
| W | 6 | 2.1% |
| Q | 5 | 1.7% |
| E | 4 | 1.4% |
| Other values (11) | 24 | 8.3% |
| Value | Count | Frequency (%) |
| C | 84 | |
| P | 46 | |
| A | 44 | |
| O | 36 | |
| S | 32 | 10.4% |
| N | 12 | 3.9% |
| T | 11 | 3.6% |
| W | 11 | 3.6% |
| F | 6 | 1.9% |
| Q | 5 | 1.6% |
| Other values (9) | 22 | 7.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2928 |
| Value | Count | Frequency (%) |
| ASCII | 2968 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 381 | |
| 1 | 337 | |
| 2 | 297 | |
| 4 | 248 | |
| 7 | 239 | |
| 6 | 206 | 7.0% |
| 0 | 201 | 6.9% |
| 5 | 191 | 6.5% |
| 9 | 165 | 5.6% |
| 8 | 138 | 4.7% |
| Other values (24) | 525 |
| Value | Count | Frequency (%) |
| 3 | 355 | |
| 1 | 344 | |
| 2 | 304 | |
| 4 | 251 | |
| 7 | 249 | |
| 6 | 210 | 7.1% |
| 0 | 191 | 6.4% |
| 5 | 184 | 6.2% |
| 9 | 161 | 5.4% |
| 8 | 145 | 4.9% |
| Other values (22) | 574 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 170 | 184 |
| Distinct (%) | 38.1% | 41.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 28.683744 | 33.485911 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 12 | 3 |
| Zeros (%) | 2.7% | 0.7% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.05105 | 7.2292 |
| Q1 | 7.8958 | 7.9031 |
| median | 13 | 15.0479 |
| Q3 | 29.125 | 31.359375 |
| 95-th percentile | 90 | 113.275 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 21.2292 | 23.456275 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 42.669032 | 50.731497 |
| Coefficient of variation (CV) | 1.4875684 | 1.5150102 |
| Kurtosis | 43.49269 | 37.783069 |
| Mean | 28.683744 | 33.485911 |
| Median Absolute Deviation (MAD) | 5.7708 | 7.76665 |
| Skewness | 5.2552911 | 5.069095 |
| Sum | 12792.95 | 14934.716 |
| Variance | 1820.6463 | 2573.6848 |
| Monotonicity | Not monotonic | Not monotonic |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 7.8958 | 23 | 5.2% |
| 8.05 | 20 | 4.5% |
| 13 | 19 | 4.3% |
| 7.75 | 17 | 3.8% |
| 26 | 14 | 3.1% |
| 10.5 | 12 | 2.7% |
| 0 | 12 | 2.7% |
| 7.925 | 11 | 2.5% |
| 7.225 | 9 | 2.0% |
| 7.2292 | 9 | 2.0% |
| Other values (160) | 300 | 67.3% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 7.8958 | 23 | 5.2% |
| 13 | 23 | 5.2% |
| 8.05 | 21 | 4.7% |
| 7.75 | 20 | 4.5% |
| 26 | 15 | 3.4% |
| 10.5 | 14 | 3.1% |
| 7.225 | 8 | 1.8% |
| 7.25 | 8 | 1.8% |
| 7.8542 | 8 | 1.8% |
| 7.775 | 7 | 1.6% |
| Other values (174) | 299 | 67.0% |
Extreme values (minimum 10 values)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 12 | 2.7% |
| 4.0125 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 2 | 0.4% |
| 6.975 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 2 | 0.4% |

Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3 | 0.7% |
| 4.0125 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 1 | 0.2% |
| 7.125 | 2 | 0.4% |
| 7.225 | 8 | 1.8% |
| 7.2292 | 6 | 1.3% |
Cabin
Text
| Dataset A | Dataset B | |
|---|---|---|
| Distinct | 78 | 85 |
| Distinct (%) | 82.1% | 84.2% |
| Missing | 351 | 345 |
| Missing (%) | 78.7% | 77.4% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| Dataset A | Dataset B | |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.3473684 | 3.5445545 |
| Min length | 1 | 1 |
Characters and Unicode
| Dataset A | Dataset B | |
|---|---|---|
| Total characters | 318 | 358 |
| Distinct characters | 18 | 18 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| Dataset A | Dataset B | |
|---|---|---|
| Unique | 64 | 72 |
| Unique (%) | 67.4% | 71.3% |
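The Cabin percentages use two different denominators. The sketch below is an added illustration, not part of the generated report. It assumes, as the numbers suggest, that Missing (%) is taken over all 446 rows while Distinct (%), Unique (%) and mean length are taken over the non-missing values only; `cabin` and `cabin_ratios` are hypothetical names.

```python
# Counts copied from the Cabin tables; n = 446 rows per dataset (overview table).
cabin = {
    "A": {"n": 446, "missing": 351, "distinct": 78, "unique": 64, "chars": 318},
    "B": {"n": 446, "missing": 345, "distinct": 85, "unique": 72, "chars": 358},
}


def cabin_ratios(c):
    """Recompute the reported ratios from the raw counts."""
    present = c["n"] - c["missing"]  # rows with a non-missing Cabin value
    return {
        "missing_pct": round(100 * c["missing"] / c["n"], 1),
        "distinct_pct": round(100 * c["distinct"] / present, 1),
        "unique_pct": round(100 * c["unique"] / present, 1),
        "mean_length": round(c["chars"] / present, 7),
    }


for name, c in cabin.items():
    print(name, cabin_ratios(c))
```

With 95 non-missing values in Dataset A and 101 in Dataset B, the recomputed ratios line up with the Distinct (%), Missing (%), Unique (%) and Mean length rows.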
Sample
| Dataset A | Dataset B | |
|---|---|---|
| 1st row | E63 | C22 C26 |
| 2nd row | B20 | B78 |
| 3rd row | C104 | E58 |
| 4th row | E44 | C126 |
| 5th row | B51 B53 B55 | C148 |
| Value | Count | Frequency (%) |
| g6 | 4 | 3.7% |
| f33 | 3 | 2.8% |
| e24 | 2 | 1.9% |
| e8 | 2 | 1.9% |
| c65 | 2 | 1.9% |
| c123 | 2 | 1.9% |
| c2 | 2 | 1.9% |
| c68 | 2 | 1.9% |
| d | 2 | 1.9% |
| d20 | 2 | 1.9% |
| Other values (80) | 84 |
| Value | Count | Frequency (%) |
| c22 | 3 | 2.6% |
| f33 | 3 | 2.6% |
| c26 | 3 | 2.6% |
| f | 3 | 2.6% |
| e101 | 3 | 2.6% |
| e8 | 2 | 1.7% |
| d36 | 2 | 1.7% |
| c52 | 2 | 1.7% |
| c68 | 2 | 1.7% |
| g73 | 2 | 1.7% |
| Other values (84) | 90 |
Most occurring characters
| Value | Count | Frequency (%) |
| C | 33 | |
| 3 | 29 | 9.1% |
| 6 | 29 | 9.1% |
| 2 | 27 | 8.5% |
| B | 25 | 7.9% |
| D | 22 | 6.9% |
| 5 | 21 | 6.6% |
| 1 | 20 | 6.3% |
| 8 | 17 | 5.3% |
| 0 | 16 | 5.0% |
| Other values (8) | 79 |
| Value | Count | Frequency (%) |
| C | 40 | |
| 2 | 35 | 9.8% |
| 1 | 32 | 8.9% |
| 3 | 31 | 8.7% |
| 6 | 28 | 7.8% |
| B | 25 | 7.0% |
| 8 | 23 | 6.4% |
| E | 19 | 5.3% |
| 4 | 18 | 5.0% |
| 7 | 17 | 4.7% |
| Other values (8) | 90 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 199 | |
| Uppercase Letter | 107 | |
| Space Separator | 12 | 3.8% |
| Value | Count | Frequency (%) |
| Decimal Number | 229 | |
| Uppercase Letter | 115 | |
| Space Separator | 14 | 3.9% |
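The category names above match Unicode general categories. As an added illustration (not the profiling tool's own code), the sketch below classifies characters with Python's standard `unicodedata` module, using only the five Dataset A values from the Sample table, so its counts are far smaller than the full-column totals reported here. The `NAMES` mapping is a hypothetical helper.

```python
import unicodedata
from collections import Counter

# The five Dataset A values listed in the Cabin "Sample" table.
samples = ["E63", "B20", "C104", "E44", "B51 B53 B55"]

# Map two-letter general category codes to the names used in the report.
NAMES = {"Lu": "Uppercase Letter", "Nd": "Decimal Number", "Zs": "Space Separator"}

counts = Counter(
    NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
    for value in samples
    for ch in value
)

for category, count in counts.most_common():
    print(category, count)
```

Multi-cabin entries such as `B51 B53 B55` are the source of the Space Separator category.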
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 33 | |
| B | 25 | |
| D | 22 | |
| E | 14 | |
| F | 7 | 6.5% |
| G | 4 | 3.7% |
| A | 2 | 1.9% |
| Value | Count | Frequency (%) |
| C | 40 | |
| B | 25 | |
| E | 19 | |
| D | 17 | |
| F | 7 | 6.1% |
| G | 4 | 3.5% |
| A | 3 | 2.6% |
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 29 | |
| 6 | 29 | |
| 2 | 27 | |
| 5 | 21 | |
| 1 | 20 | |
| 8 | 17 | |
| 0 | 16 | |
| 7 | 15 | |
| 4 | 13 | |
| 9 | 12 |
| Value | Count | Frequency (%) |
| 2 | 35 | |
| 1 | 32 | |
| 3 | 31 | |
| 6 | 28 | |
| 8 | 23 | |
| 4 | 18 | |
| 7 | 17 | |
| 5 | 17 | |
| 0 | 14 | 6.1% |
| 9 | 14 | 6.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 12 | |
| Value | Count | Frequency (%) |
| (space) | 14 | |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 211 | |
| Latin | 107 |
| Value | Count | Frequency (%) |
| Common | 243 | |
| Latin | 115 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| C | 33 | |
| B | 25 | |
| D | 22 | |
| E | 14 | |
| F | 7 | 6.5% |
| G | 4 | 3.7% |
| A | 2 | 1.9% |
| Value | Count | Frequency (%) |
| C | 40 | |
| B | 25 | |
| E | 19 | |
| D | 17 | |
| F | 7 | 6.1% |
| G | 4 | 3.5% |
| A | 3 | 2.6% |
Common
| Value | Count | Frequency (%) |
| 3 | 29 | |
| 6 | 29 | |
| 2 | 27 | |
| 5 | 21 | |
| 1 | 20 | |
| 8 | 17 | |
| 0 | 16 | |
| 7 | 15 | |
| 4 | 13 | |
| 9 | 12 |
| Value | Count | Frequency (%) |
| 2 | 35 | |
| 1 | 32 | |
| 3 | 31 | |
| 6 | 28 | |
| 8 | 23 | |
| 4 | 18 | |
| 7 | 17 | |
| 5 | 17 | |
| 0 | 14 | 5.8% |
| 9 | 14 | 5.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 318 |
| Value | Count | Frequency (%) |
| ASCII | 358 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| C | 33 | |
| 3 | 29 | 9.1% |
| 6 | 29 | 9.1% |
| 2 | 27 | 8.5% |
| B | 25 | 7.9% |
| D | 22 | 6.9% |
| 5 | 21 | 6.6% |
| 1 | 20 | 6.3% |
| 8 | 17 | 5.3% |
| 0 | 16 | 5.0% |
| Other values (8) | 79 |
| Value | Count | Frequency (%) |
| C | 40 | |
| 2 | 35 | 9.8% |
| 1 | 32 | 8.9% |
| 3 | 31 | 8.7% |
| 6 | 28 | 7.8% |
| B | 25 | 7.0% |
| 8 | 23 | 6.4% |
| E | 19 | 5.3% |
| 4 | 18 | 5.0% |
| 7 | 17 | 4.7% |
| Other values (8) | 90 |
Embarked
Categorical
| Dataset A | Dataset B | |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 1 | 1 |
| Missing (%) | 0.2% | 0.2% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| Dataset A | Dataset B | |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| Dataset A | Dataset B | |
|---|---|---|
| Total characters | 445 | 445 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| Dataset A | Dataset B | |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| Dataset A | Dataset B | |
|---|---|---|
| 1st row | S | S |
| 2nd row | S | S |
| 3rd row | S | C |
| 4th row | S | S |
| 5th row | S | Q |
Common Values
| Value | Count | Frequency (%) |
| S | 327 | |
| C | 79 | 17.7% |
| Q | 39 | 8.7% |
| (Missing) | 1 | 0.2% |
| Value | Count | Frequency (%) |
| S | 319 | |
| C | 81 | 18.2% |
| Q | 45 | 10.1% |
| (Missing) | 1 | 0.2% |
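The Embarked frequencies in the Common Values tables differ slightly from those in the character-level tables (for example 17.7% versus 17.8% for C in Dataset A). The sketch below is an added illustration, assuming the first set is computed over all 446 rows (missing included) and the second over the 445 characters actually present; `shares` is a hypothetical helper.

```python
# Dataset A counts copied from the Common Values table above.
counts_a = {"S": 327, "C": 79, "Q": 39}
missing_a = 1


def shares(counts, denominator):
    """Percentage share of each value, rounded to one decimal place."""
    return {k: round(100 * v / denominator, 1) for k, v in counts.items()}


rows = sum(counts_a.values()) + missing_a  # all rows, missing included
chars = sum(counts_a.values())             # non-missing values only

print("over rows: ", shares(counts_a, rows))
print("over chars:", shares(counts_a, chars))
```

The one missing value is enough to shift the rounded percentages for C and Q by 0.1 points.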
| Value | Count | Frequency (%) |
| s | 327 | |
| c | 79 | 17.8% |
| q | 39 | 8.8% |
| Value | Count | Frequency (%) |
| s | 319 | |
| c | 81 | 18.2% |
| q | 45 | 10.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 327 | |
| C | 79 | 17.8% |
| Q | 39 | 8.8% |
| Value | Count | Frequency (%) |
| S | 319 | |
| C | 81 | 18.2% |
| Q | 45 | 10.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 445 |
| Value | Count | Frequency (%) |
| Uppercase Letter | 445 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 327 | |
| C | 79 | 17.8% |
| Q | 39 | 8.8% |
| Value | Count | Frequency (%) |
| S | 319 | |
| C | 81 | 18.2% |
| Q | 45 | 10.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 445 |
| Value | Count | Frequency (%) |
| Latin | 445 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 327 | |
| C | 79 | 17.8% |
| Q | 39 | 8.8% |
| Value | Count | Frequency (%) |
| S | 319 | |
| C | 81 | 18.2% |
| Q | 45 | 10.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 445 |
| Value | Count | Frequency (%) |
| ASCII | 445 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| S | 327 | |
| C | 79 | 17.8% |
| Q | 39 | 8.8% |
| Value | Count | Frequency (%) |
| S | 319 | |
| C | 81 | 18.2% |
| Q | 45 | 10.1% |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 511 | 512 | 0 | 3 | Webber, Mr. James | male | NaN | 0 | 0 | SOTON/OQ 3101316 | 8.0500 | NaN | S |
| 235 | 236 | 0 | 3 | Harknett, Miss. Alice Phoebe | female | NaN | 0 | 0 | W./C. 6609 | 7.5500 | NaN | S |
| 542 | 543 | 0 | 3 | Andersson, Miss. Sigrid Elisabeth | female | 11.0 | 4 | 2 | 347082 | 31.2750 | NaN | S |
| 788 | 789 | 1 | 3 | Dean, Master. Bertram Vere | male | 1.0 | 1 | 2 | C.A. 2315 | 20.5750 | NaN | S |
| 338 | 339 | 1 | 3 | Dahl, Mr. Karl Edwart | male | 45.0 | 0 | 0 | 7598 | 8.0500 | NaN | S |
| 112 | 113 | 0 | 3 | Barton, Mr. David John | male | 22.0 | 0 | 0 | 324669 | 8.0500 | NaN | S |
| 462 | 463 | 0 | 1 | Gee, Mr. Arthur H | male | 47.0 | 0 | 0 | 111320 | 38.5000 | E63 | S |
| 159 | 160 | 0 | 3 | Sage, Master. Thomas Henry | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 846 | 847 | 0 | 3 | Sage, Mr. Douglas Bullen | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 493 | 494 | 0 | 1 | Artagaveytia, Mr. Ramon | male | 71.0 | 0 | 0 | PC 17609 | 49.5042 | NaN | C |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 498 | 499 | 0 | 1 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S |
| 870 | 871 | 0 | 3 | Balkic, Mr. Cerin | male | 26.0 | 0 | 0 | 349248 | 7.8958 | NaN | S |
| 852 | 853 | 0 | 3 | Boulos, Miss. Nourelain | female | 9.0 | 1 | 1 | 2678 | 15.2458 | NaN | C |
| 528 | 529 | 0 | 3 | Salonen, Mr. Johan Werner | male | 39.0 | 0 | 0 | 3101296 | 7.9250 | NaN | S |
| 629 | 630 | 0 | 3 | O'Connell, Mr. Patrick D | male | NaN | 0 | 0 | 334912 | 7.7333 | NaN | Q |
| 31 | 32 | 1 | 1 | Spencer, Mrs. William Augustus (Marie Eugenie) | female | NaN | 1 | 0 | PC 17569 | 146.5208 | B78 | C |
| 662 | 663 | 0 | 1 | Colley, Mr. Edward Pomeroy | male | 47.0 | 0 | 0 | 5727 | 25.5875 | E58 | S |
| 828 | 829 | 1 | 3 | McCormack, Mr. Thomas Joseph | male | NaN | 0 | 0 | 367228 | 7.7500 | NaN | Q |
| 614 | 615 | 0 | 3 | Brocklebank, Mr. William Alfred | male | 35.0 | 0 | 0 | 364512 | 8.0500 | NaN | S |
| 526 | 527 | 1 | 2 | Ridsdale, Miss. Lucy | female | 50.0 | 0 | 0 | W./C. 14258 | 10.5000 | NaN | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 667 | 668 | 0 | 3 | Rommetvedt, Mr. Knud Paust | male | NaN | 0 | 0 | 312993 | 7.7750 | NaN | S |
| 306 | 307 | 1 | 1 | Fleming, Miss. Margaret | female | NaN | 0 | 0 | 17421 | 110.8833 | NaN | C |
| 176 | 177 | 0 | 3 | Lefebre, Master. Henry Forbes | male | NaN | 3 | 1 | 4133 | 25.4667 | NaN | S |
| 603 | 604 | 0 | 3 | Torber, Mr. Ernst William | male | 44.0 | 0 | 0 | 364511 | 8.0500 | NaN | S |
| 64 | 65 | 0 | 1 | Stewart, Mr. Albert A | male | NaN | 0 | 0 | PC 17605 | 27.7208 | NaN | C |
| 139 | 140 | 0 | 1 | Giglio, Mr. Victor | male | 24.0 | 0 | 0 | PC 17593 | 79.2000 | B86 | C |
| 532 | 533 | 0 | 3 | Elias, Mr. Joseph Jr | male | 17.0 | 1 | 1 | 2690 | 7.2292 | NaN | C |
| 314 | 315 | 0 | 2 | Hart, Mr. Benjamin | male | 43.0 | 1 | 1 | F.C.C. 13529 | 26.2500 | NaN | S |
| 344 | 345 | 0 | 2 | Fox, Mr. Stanley Hubert | male | 36.0 | 0 | 0 | 229236 | 13.0000 | NaN | S |
| 340 | 341 | 1 | 2 | Navratil, Master. Edmond Roger | male | 2.0 | 1 | 1 | 230080 | 26.0000 | F2 | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 55 | 56 | 1 | 1 | Woolner, Mr. Hugh | male | NaN | 0 | 0 | 19947 | 35.5000 | C52 | S |
| 272 | 273 | 1 | 2 | Mellinger, Mrs. (Elizabeth Anne Maidment) | female | 41.0 | 0 | 1 | 250644 | 19.5000 | NaN | S |
| 817 | 818 | 0 | 2 | Mallet, Mr. Albert | male | 31.0 | 1 | 1 | S.C./PARIS 2079 | 37.0042 | NaN | C |
| 490 | 491 | 0 | 3 | Hagland, Mr. Konrad Mathias Reiersen | male | NaN | 1 | 0 | 65304 | 19.9667 | NaN | S |
| 15 | 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 | 0 | 0 | 248706 | 16.0000 | NaN | S |
| 109 | 110 | 1 | 3 | Moran, Miss. Bertha | female | NaN | 1 | 0 | 371110 | 24.1500 | NaN | Q |
| 138 | 139 | 0 | 3 | Osen, Mr. Olaf Elon | male | 16.0 | 0 | 0 | 7534 | 9.2167 | NaN | S |
| 569 | 570 | 1 | 3 | Jonsson, Mr. Carl | male | 32.0 | 0 | 0 | 350417 | 7.8542 | NaN | S |
| 677 | 678 | 1 | 3 | Turja, Miss. Anna Sofia | female | 18.0 | 0 | 0 | 4138 | 9.8417 | NaN | S |
| 753 | 754 | 0 | 3 | Jonkoff, Mr. Lalio | male | 23.0 | 0 | 0 | 349204 | 7.8958 | NaN | S |
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.